For this lecture, the learning objectives include:
Create univariate and bivariate plots of data (continuous-continuous & continuous-categorical).
Apply varying basic symbologies for representing data in plots.
Use named and hex colors to better
There is a classic data set in statistics called Fisher’s Iris Data Set, which records sepal and petal lengths and widths for 50 flowers from each of three species of Iris.
Iris morphology
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
setosa :50
versicolor:50
virginica :50
xlab & ylab: The names attached to both x- and y-axes.
main: The title on top of the graph.
breaks: This controls the way in which the original data are partitioned (e.g., the width of the bars along the x-axis). If you pass a single number n to this option, the data will be partitioned into approximately n bins (R treats the number as a suggestion and picks tidy breakpoints).
col: The color of the bar (not the border).
probability: A flag, either TRUE or FALSE (the default), to have the y-axis scaled as a probability density for each bin rather than a count of the number of elements in that range.
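The options above can be tried out in a short sketch. It assumes the histogram is drawn from the sepal lengths in the built-in iris data; the axis labels, title, and bar color are illustrative choices rather than the ones used in the lecture.

```r
# Sepal lengths from the built-in iris data (measurements are in centimeters).
sepal_length <- iris$Sepal.Length

# Default behavior: the y-axis shows counts. breaks = 10 asks for roughly
# 10 bins; R may adjust this to get tidy breakpoints.
h_count <- hist(sepal_length,
                xlab = "Sepal Length (cm)",
                ylab = "Count",
                main = "Iris Sepal Lengths",
                breaks = 10,
                col = "gray")

# Same data with probability = TRUE: the y-axis is rescaled to a density,
# so the bar areas sum to 1.
h_dens <- hist(sepal_length,
               xlab = "Sepal Length (cm)",
               main = "Iris Sepal Lengths",
               breaks = 10,
               col = "gray",
               probability = TRUE)
```

Both calls return a histogram object invisibly, so the bin boundaries and heights can be inspected with `h_count$breaks` and `h_count$counts`.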
Call:
density.default(x = sepal_length)
Data: sepal_length (150 obs.); Bandwidth 'bw' = 0.2736
x y
Min. :3.479 Min. :0.0001495
1st Qu.:4.790 1st Qu.:0.0341599
Median :6.100 Median :0.1534105
Mean :6.100 Mean :0.1905934
3rd Qu.:7.410 3rd Qu.:0.3792237
Max. :8.721 Max. :0.3968365
plot()
In R, many objects understand how to plot themselves.
Density objects
Analyses (regression, ANOVA, etc)
points, lines, polygons, & rasters
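As a small illustration of an object that knows how to plot itself, the sketch below builds a density object and hands it straight to plot(). It assumes `sepal_length` holds `iris$Sepal.Length`, which matches the 150 observations reported in the density output earlier; the title and axis label are illustrative.

```r
# Assumed source of the sepal_length vector used in the density output.
sepal_length <- iris$Sepal.Length

# density() returns an object of class "density".
d <- density(sepal_length)
class(d)

# plot() dispatches on that class and draws the estimated curve.
plot(d,
     main = "Density of Sepal Length",
     xlab = "Sepal Length (cm)")
```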
When plotting two vectors of data, the first is placed on the x-axis and the second on the y-axis.
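A minimal sketch of a two-vector plot follows. The choice of sepal length and sepal width as the pair is an assumption made for illustration.

```r
# Two numeric vectors of equal length from the iris data.
x <- iris$Sepal.Length
y <- iris$Sepal.Width

# The first argument goes on the x-axis, the second on the y-axis.
plot(x, y,
     xlab = "Sepal Length (cm)",
     ylab = "Sepal Width (cm)")
```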
plot() Options
| Parameter | Description |
|---|---|
| type | The kind of plot to show ('p'oint, 'l'ine, 'b'oth, or 'o'ver). A point plot is the default. |
| pch | The character (or symbol) used for plotting. There are 26 recognized general characters to use for plotting. The default is pch=1. |
| col | The color of the symbols/lines that are plotted. |
| cex | The magnification of the plotted character. The default is cex=1; other values increase (cex > 1) or decrease (0 < cex < 1) the size of the symbols. The same scaling applies to cex.lab and cex.axis for the axis labels and tick labels. |
| lwd | The width of any lines in the plot. |
| lty | The type of line to be plotted (solid, dashed, etc.). |
| bty | The 'box' type around the plot ("o", "l", "7", "c", "u", "]", and my favorite, "n"). |
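The parameters in the table can be combined in a single call. The sketch below is one possible combination, assuming the sepal measurements from iris; the specific symbol, color, and line settings are arbitrary picks for illustration.

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width

# Filled circles (pch = 16), drawn 1.5 times larger than default,
# in a named color, with no box around the plot region.
plot(x, y,
     type = "p",
     pch = 16,
     col = "gray40",
     cex = 1.5,
     cex.lab = 1.2,
     bty = "n",
     xlab = "Sepal Length (cm)",
     ylab = "Sepal Width (cm)")

# lwd and lty control lines; here a dashed horizontal reference line
# at the mean sepal width is added to the existing plot.
mean_width <- mean(y)
abline(h = mean_width, lwd = 2, lty = 2)
```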
iris dataset
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
setosa :50
versicolor:50
virginica :50
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3
[112] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[149] 3 3
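The vector of 1s, 2s, and 3s above looks like the species factor converted to its underlying integer codes. Assuming that is how it was produced, the sketch below shows the conversion and one way to use the codes as plotting symbols; the legend placement is an illustrative choice.

```r
# A factor stores each level as an integer code: setosa = 1,
# versicolor = 2, virginica = 3.
species_id <- as.numeric(iris$Species)

# Use the codes directly as pch values so each species gets its own symbol.
plot(iris$Sepal.Length, iris$Sepal.Width,
     pch = species_id,
     xlab = "Sepal Length (cm)",
     ylab = "Sepal Width (cm)")

# Label the symbols with the species names.
legend("topright",
       legend = levels(iris$Species),
       pch = 1:3,
       bty = "n")
```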
In R, there are 657 different named colors accessible through the function colors().
[1] "gray" "gray" "gray" "gray" "gray" "gray"
[7] "gray" "gray" "gray" "gray" "gray" "gray"
[13] "gray" "gray" "gray" "gray" "gray" "gray"
[19] "gray" "gray" "gray" "gray" "gray" "gray"
[25] "gray" "gray" "gray" "gray" "gray" "gray"
[31] "gray" "gray" "gray" "gray" "gray" "gray"
[37] "gray" "gray" "gray" "gray" "gray" "gray"
[43] "gray" "gray" "gray" "gray" "gray" "gray"
[49] "gray" "gray" "thistle2" "thistle2" "thistle2" "thistle2"
[55] "thistle2" "thistle2" "thistle2" "thistle2" "thistle2" "thistle2"
[61] "thistle2" "thistle2" "thistle2" "thistle2" "thistle2" "thistle2"
[67] "thistle2" "thistle2" "thistle2" "thistle2" "thistle2" "thistle2"
[73] "thistle2" "thistle2" "thistle2" "thistle2" "thistle2" "thistle2"
[79] "thistle2" "thistle2" "thistle2" "thistle2" "thistle2" "thistle2"
[85] "thistle2" "thistle2" "thistle2" "thistle2" "thistle2" "thistle2"
[91] "thistle2" "thistle2" "thistle2" "thistle2" "thistle2" "thistle2"
[97] "thistle2" "thistle2" "thistle2" "thistle2" "green" "green"
[103] "green" "green" "green" "green" "green" "green"
[109] "green" "green" "green" "green" "green" "green"
[115] "green" "green" "green" "green" "green" "green"
[121] "green" "green" "green" "green" "green" "green"
[127] "green" "green" "green" "green" "green" "green"
[133] "green" "green" "green" "green" "green" "green"
[139] "green" "green" "green" "green" "green" "green"
[145] "green" "green" "green" "green" "green" "green"
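A color vector like the one above can be built by indexing a short vector of color names with the species factor. This is a sketch of one way to do it, not necessarily how the lecture produced the output; the variable names are hypothetical.

```r
# Number of named colors R knows about.
length(colors())

# One named color per species, in the order of levels(iris$Species).
species_colors <- c("gray", "thistle2", "green")

# Indexing by a factor uses its integer codes, so every observation
# picks up the color that belongs to its species.
point_colors <- species_colors[iris$Species]

plot(iris$Sepal.Length, iris$Sepal.Width,
     col = point_colors,
     pch = 16,
     xlab = "Sepal Length (cm)",
     ylab = "Sepal Width (cm)")
```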
Color spaces defined by:
In base-16 no less:
0 1 2 3 4 5 6 7 8 9 A B C D E F
So for two digits, that is 16 × 16 = 256 distinct values for each color channel
00 → FF
Represented as triplets of the form RRGGBB, preceded by a hash sign (#)
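The hex notation can be explored directly in R. The sketch below converts between base-16 digits, RGB values, and hex strings; the three-color palette at the end is a hypothetical example, not one taken from the lecture.

```r
# Two hex digits cover 0 to 255.
max_channel <- strtoi("FF", base = 16L)

# Build a hex color from red, green, and blue values on a 0-255 scale.
pure_red <- rgb(255, 0, 0, maxColorValue = 255)

# Go the other way: look up the RGB values behind a named color.
col2rgb("thistle2")

# Hex strings work anywhere a named color does.
# Hypothetical palette, chosen only for illustration.
hex_colors <- c("#1B9E77", "#D95F02", "#7570B3")
plot(iris$Sepal.Length, iris$Sepal.Width,
     col = hex_colors[iris$Species],
     pch = 16,
     xlab = "Sepal Length (cm)",
     ylab = "Sepal Width (cm)")
```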
Google up something like “Color Theme Generator” and see what you find.
One I use is: coolors
mu.Setosa <- mean( iris$Sepal.Length[ iris$Species == "setosa" ])
mu.Versicolor <- mean( iris$Sepal.Length[ iris$Species == "versicolor" ])
mu.Virginica <- mean( iris$Sepal.Length[ iris$Species == "virginica" ])
meanSepalLength <- c( mu.Setosa, mu.Versicolor, mu.Virginica )
meanSepalLength
[1] 5.006 5.936 6.588
Plotting quantitative data as a magnitude or amount.
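A bar plot is the usual base R way to show magnitudes like these. The sketch below assumes the three species means computed above; the labels and bar color are illustrative.

```r
# Mean sepal length per species, as reported in the output above.
meanSepalLength <- c(5.006, 5.936, 6.588)

# barplot() draws one bar per value; names.arg labels the bars.
# It returns the x-axis midpoints of the bars.
bar_mids <- barplot(meanSepalLength,
                    names.arg = c("setosa", "versicolor", "virginica"),
                    xlab = "Species",
                    ylab = "Mean Sepal Length (cm)",
                    col = "gray")
```

The returned midpoints are handy if text or error bars need to be added on top of the bars later.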
iris$Species: setosa
[1] 5.006
------------------------------------------------------------
iris$Species: versicolor
[1] 5.936
------------------------------------------------------------
iris$Species: virginica
[1] 6.588
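The grouped output above has the layout that by() prints. Assuming that is where it came from, the sketch below reproduces it and shows tapply() as an alternative that returns a plain named vector, which is often more convenient for plotting.

```r
# Apply mean() to sepal length within each species; prints one block
# per species, separated by dashed lines.
by(iris$Sepal.Length, iris$Species, mean)

# tapply() does the same grouping but returns a named numeric vector.
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)
species_means
```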
A boxplot packs a great deal of information into a small space and is appropriate when the groupings on the x-axis are categorical. For each category, the graphical representation includes:
The median value of the raw data.
A box indicating the area between the first and third quartiles (i.e., the values enclosing the middle 50% of the data, from the 25th to the 75th percentile). The top and bottom of the box are often referred to as the hinges.
A notch (if requested) representing confidence around the estimate of the median.
Whiskers extending to the most extreme data points that lie within \(1.5 \times IQR\) of the box (IQR is the interquartile range).
Any data points that fall beyond the whiskers, plotted as individual points.
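The pieces listed above can be seen in a sketch like the following, which assumes sepal length grouped by species; the axis labels are illustrative.

```r
# The formula reads "Sepal.Length as a function of Species".
# notch = TRUE adds the notches around each median.
bp <- boxplot(Sepal.Length ~ Species,
              data = iris,
              notch = TRUE,
              xlab = "Species",
              ylab = "Sepal Length (cm)")

# The returned object holds the numbers behind the picture.
# Rows of stats: lower whisker, lower hinge, median, upper hinge,
# upper whisker; one column per species.
bp$stats

# Values drawn as individual points beyond the whiskers.
bp$out
```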
Pearson's product-moment correlation
data: iris$Sepal.Length and iris$Sepal.Width
t = -1.4403, df = 148, p-value = 0.1519
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.27269325 0.04351158
sample estimates:
cor
-0.1175698
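The output above matches what cor.test() prints for the two sepal measurements. A minimal sketch of the call, along with how to pull individual numbers out of the result:

```r
# Pearson correlation test between sepal length and sepal width.
ct <- cor.test(iris$Sepal.Length, iris$Sepal.Width)
ct

# Individual pieces of the result are available by name.
ct$estimate   # the correlation coefficient
ct$p.value    # the p-value of the test
ct$conf.int   # the 95 percent confidence interval
```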